Abstract

Background: Microinversions are cytologically undetectable inversions of DNA sequences that accumulate slowly 
in genomes. Like many other rare genomic changes (RGCs), microinversions are thought to be virtually homoplasy- 
free evolutionary characters, suggesting that they may be very useful for difficult phylogenetic problems such as 
the avian tree of life. However, few detailed surveys of these genomic rearrangements have been conducted, 
making it difficult to assess this hypothesis or understand the impact of microinversions upon genome evolution. 

Results: We surveyed non-coding sequence data from a recent avian phylogenetic study and found substantially 
more microinversions than expected based upon prior information about vertebrate inversion rates, although this 
is likely due to underestimation of these rates in previous studies. Most microinversions were lineage-specific or 
united well-accepted groups. However, some homoplastic microinversions were evident among the informative 
characters. Hemiplasy, which reflects differences between gene trees and the species tree, did not explain the 
observed homoplasy. Two specific loci were microinversion hotspots, with high numbers of inversions that 
included both the homoplastic as well as some overlapping microinversions. Neither stem-loop structures nor 
detectable sequence motifs were associated with microinversions in the hotspots. 

Conclusions: Microinversions can provide valuable phylogenetic information, although power analysis indicates 
that large amounts of sequence data will be necessary to identify enough inversions (and similar RGCs) to resolve 
short branches in the tree of life. Moreover, microinversions are not perfect characters and should be interpreted 
with caution, just as with any other character type. Independent of their use for phylogenetic analyses, 
microinversions are important because they have the potential to complicate alignment of non-coding sequences. 
Despite their low rate of accumulation, they have clearly contributed to genome evolution, suggesting that active 
identification of microinversions will prove useful in future phylogenomic studies. 



Background 

Reconstructing the evolutionary relationships among 
organisms and changes in their genomes are major 
goals of phylogenomics [1-3]. The characteristics of gen- 
omes that have been used to reconstruct evolutionary 
history reflect the multitude of changes that arise due to 
distinct mutational mechanisms and accumulate at a 
variety of rates (Figure 1). The most slowly accumulat- 
ing changes, collectively designated rare genomic 
changes (RGCs), reflect a heterogeneous set of mutational
processes. RGCs include transposable element
insertions (e.g., Kriegs et al. [4]), gene order changes [5], 
and additional less-studied phenomena [6-8]. Microin- 
versions [6] are one of these relatively poorly-studied 
types of RGCs. 

Despite this heterogeneity, RGCs are thought to 
exhibit less homoplasy (evolutionary convergence and 
reversals) than nucleotide substitutions [9]. Indeed, 
some RGCs have been viewed as "perfect" homoplasy- 
free (or virtually homoplasy-free) characters. Establish- 
ing that specific types of RGCs, like microinversions, 
are perfect characters is important for two reasons. 
Figure 1 Approximate rates of accumulation for different genomic changes over evolutionary time. Details of the literature survey used
to estimate these rates are provided in Additional file 2. The estimate of the avian microinversion rate reflects the results of this paper. Estimates
of evolutionary rates for nucleotide substitutions and indels in birds appear lower than those for mammals, consistent with some previous
publications [59], but it is important to note that substantial rate variation occurs within each group (e.g., [27,60]). As described in the text, it
may be better to interpret prior estimates of the mammalian microinversion rate as the rate at which relatively long microinversions accumulate.



First, it would provide information about the muta- 
tional and evolutionary processes that underlie their 
accumulation, illuminating processes that contribute to 
genome evolution. Second, perfect RGCs could provide 
a practical means to assemble the tree of life because 
phylogenetic reconstruction is straightforward when 
homoplasy is absent [6]. 

Even perfect RGCs can appear homoplastic when 
found in genomic regions with an evolutionary history 
incongruent with the species tree [5,10]. The appearance 
of homoplasy due to incomplete lineage sorting, called 
hemiplasy [11], typically occurs in trees with short inter- 
nal branches [12,13]. However, rapid radiations with 
short internal branches ("bushes" or "biological big 
bangs") may be relatively common events in the tree of 
life [14,15]. This suggests that analyses of RGC data 
should consider hemiplasy explicitly. 

Microinversions are defined as cytologically undetectable
inversions [6], although in practice the size range
considered depends on the type of data examined and the
method used for detection. Feuk et al. [16] classified
inversions ranging in size from 23 base pairs (bp) to 62
megabases (Mb) as microinversions, whereas Ma et al.
[1] considered all inversions greater than 50 kilobases
(kb) to be "large" inversions rather than microinversions.
The lower limit also varies, going down to 4 bp [17].
Not surprisingly, studies using whole genomes (e.g.,
[1,16]) have identified larger inversions, while phylogenetic
studies (often restricted to a single locus or region
of an organellar genome) have typically revealed much
smaller microinversions (e.g., [17-21]). Nonetheless, the
size spectra reported for genome-scale and phylogenetic
studies overlap, suggesting that both types of studies



include at least some inversions that result from similar 
biological phenomena. Using the term "microinversion" 
to refer to inversions that are long enough to include 
one or more complete genes seems inappropriate, sug- 
gesting that it should be reserved for shorter inversions. 
However, this criterion may be difficult to apply in prac- 
tice, since the length of genes exhibits substantial varia- 
tion among organisms and within genomes. The 
majority of genes are <50 kb in length in most verte- 
brate lineages, suggesting that the Ma et al. [1] size cri- 
terion may be appropriate and simple to use. Therefore, 
we recommend using 50 kb as the maximum size for 
microinversions in most vertebrate genomes, although
we also note that the most appropriate size criterion is 
likely to depend upon the focal organism. 

The hypothesis that microinversions and other RGCs
are perfect characters reflects both their large state space
(number of potential character states) and slow rate of
accumulation over evolutionary time, making independent
changes to the same state unlikely. The state space
for different RGCs will depend upon the details of each
type of genomic change, but it seems likely that the state
space for microinversions is large; they can be of a variety
of lengths and have any specific nucleotide for endpoints,
making it unlikely that independent microinversions will
appear identical. Previous studies have also suggested
that microinversions accumulate at a very low rate
(Figure 1), although this observation may be biased by
the size spectrum of the inversions that were identified
and considered to be microinversions. Ma et al. [1]
reported that smaller microinversions (they identified
inversions as short as 31 bp) occur more frequently than
larger ones. However, the rate of accumulation for
inversions that are even shorter than those identified by
Ma et al. [1] remains unclear and these differences 
among previous studies make direct comparisons chal- 
lenging. Nonetheless, it seems certain that microinver- 
sions accumulate at least several orders of magnitude 
more slowly than nucleotide substitutions. Thus, the 
hypothesis that microinversions are perfect characters 
that will be very useful for assembling the tree of life 
remains reasonable. 

The mechanism(s) responsible for microinversion 
accumulation remain poorly characterized, making 
empirical tests of the "perfect character hypothesis" for 
these relatively poorly studied RGCs critical. Indeed, 
homoplastic microinversions have been identified in 
angiosperm chloroplast genomes [17,19], in contrast to 
expectation based upon the perfect character hypothesis. 
Most chloroplast microinversions appear to be asso- 
ciated with palindromic sequences that have the poten- 
tial to form stem-loop structures in transcripts [17,19] 
and these palindromes may facilitate inversion. Indeed, 
Catalano et al. [21] reported that microinversions are 
correlated with higher stability of the hairpins that have 
the potential to form at these stem-loop regions, in 
agreement with the hypothesis that hairpin formation 
facilitates inversion. Since many chloroplast stem-loop 
structures have regulatory functions (e.g., Stern et al.
[22]) they are typically conserved, creating the potential 
for recurrent inversions at specific sites. Regulatory 
stem-loops are present in vertebrate introns (e.g., Hugo 
et al. [23]) and at least one vertebrate microinversion 
noted in a vertebrate phylogenetic study was associated 
with an inverted repeat [18]. However, conserved stem- 
loops appear to be uncommon in vertebrate introns 
whereas chloroplast stem-loops are relatively common 
[22,24]. This difference is consistent with the observa- 
tion that few animal microinversions appear homoplas- 
tic [6,25]. Indeed, all microinversions observed in those 
studies were either homoplasy-free or conflicted with 
short branches. Thus, the small number of animal 
microinversions that appear to conflict with the species 
tree based upon other data may result from hemiplasy 
rather than homoplasy. Microinversions in animal
nuclear genomes therefore remain candidates for "ideal RGCs",
able to recover branches in gene trees accurately.

Microinversions can be difficult to identify, making 
the study of these interesting and phylogenetically useful 
genomic changes challenging. In fact, ~80% of the inversions
identified in the Feuk et al. [16] comparison of the
human and chimpanzee genomes were later suggested 
to be contig assembly artifacts [6]. This problem can be 
solved by restricting the term microinversion to the 
shortest part of the inversion spectrum, limiting the 
maximum size of the microinversions to less than the 
length of an individual sequencing read (i.e., focusing on 



inversions that are <400 bp for Sanger sequencing). 
Comparing closely related taxa also has the potential to 
facilitate microinversion identification. Indeed, most 
microinversions identified in a comparison of four 
mammalian genomes were found in the two most clo- 
sely related taxa [1]. Here we use these strategies to 
identify microinversions in non-coding regions asso- 
ciated with 17 loci from 169 birds. We examined varia- 
tion among loci in the microinversion rate (hereafter 
abbreviated λ_MI), identified phylogenetically informative
and homoplastic microinversions, and found evidence 
that the number of microinversions has been underesti- 
mated in previous large-scale studies. 

Methods 

Sequencing, Alignment and Microinversion Identification

We primarily used published data [26-28], although 
some novel CLTCL1 sequences were generated using
the primers and PCR conditions from Kimball et al. [29] 
(for details, see Additional file 1). For this study, we 
focused on shorter sequences with extensive taxon sam- 
pling (Table 1) instead of complete genomic sequences 
[26-28]. Sequences were aligned manually, sometimes 
starting from an alignment produced in an automated 
manner (i.e., using Clustal [30] or MAFFT [31]). Align- 
ments were refined iteratively with input from at least 
two different individuals. During this process alignments 
were examined carefully; this resulted in the identifica- 
tion of a number of microinversions "by eye" (Addi- 
tional file 2, Table S2). 

Microinversions were also identified by a computa- 
tional method that combined the multiple sequence 
alignments with the results of complementary strand 
alignments for all pairs of sequences (Additional file 2, 
Figure S1). The pairwise complementary strand alignments
were generated using bl2seq [32] and YASS [33]
and mapped onto the multiple sequence alignments 
using a program written by ELB. This program saved a 
table that included the first and last positions of each 
pairwise complementary strand alignment in the multi- 
ple sequence alignment and highlighted the overlap- 
ping pairwise complementary strand alignments (an 
example is presented in Additional File 3 along with a 
description of the algorithm in pseudocode). Microin- 
versions are expected to result in complementary 
strand alignments that either overlap or are located 
near each other in the sequence alignment. The pre- 
sence or absence of microinversions at each position 
identified as a significant complementary strand hit 
involving sequences that were overlapping or located 
near each other in the multiple sequence alignment 
was then validated by visual inspection. Microinversion 
endpoints were assigned based upon the length of the 
complementary strand alignments, although there were 
some cases where inversion endpoints were difficult to
identify (e.g., Figure 2). Validating microinversions
shorter than 5 bp was difficult, so that was the minimum
size considered.

Table 1 Estimates of the microinversion rate (λ_MI) for different loci

Locus                  Chr(a)  Mean non-coding  Treelength  # of           Estimated rate (λ_MI)
                               length (bp)      (MY)(b)     inversions(c)  (inversions Mb⁻¹ MY⁻¹)
CLTCL1                 15      360              8890        5              1.58
CLTC                   19      1310             9280        19             1.56
PCBD1                          800              9150        5              0.68
HMGN2                  23      1340             5400        4              0.55
EEF2                   28      1210             9230        6              0.54
IRF2                   4       600              9090        2              0.37
GH1                    27      1030             9090        3              0.32
ALDOB                  Z       1450             8850        4              0.31
TPM1                   10      450              8090        1              0.28
FGB                    4       2070             9360        4              0.21
TGFB2                  3       560              9360        1              0.19
CRYAA                  1       930              8740        0              0
EGR1                   13      490(d)           8970        0              0
MB                     1       680              9190        0              0
MUSK                   Z       510              8810        0              0
MYC                    2       620(d)           9240        0              0
RHO                    12      1190             8990        0              0
Overall                        15600                        54             0.39
Excluding hotspots(e)          13930                        30             0.25

(a) Chromosomal location in the chicken (Gallus gallus).
(b) Sum of the branch lengths after rate smoothing in millions of years (MY).
Divergence times were calibrated by assuming a mid-Cretaceous (~100 MYA)
origin of Neoaves. Differences among loci reflect the amounts of missing data.
(c) The number of inversion events based upon the MP criterion.
(d) The non-coding portions of two loci (EGR1 and MYC) include 820 bp of 3'
UTR. All EGR1 non-coding sequence is 3' UTR and about half (330 bp) of MYC
non-coding sequence is 3' UTR.
(e) CLTC and CLTCL1 were excluded for this estimate.
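
The complementary strand comparison described in this section can be illustrated with a minimal Python sketch. It is not the program written by ELB and it does not call bl2seq or YASS; it only scans two already-aligned sequences of equal length for spans in which one sequence equals the reverse complement of the other over the same columns. The function names, the toy sequences, and the exhaustive search strategy are assumptions made for illustration.

```python
# Hypothetical helper names; a toy stand-in for pairwise complementary strand search.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inversion_candidates(ref, query, min_len=5):
    """Return (start, end) spans, 0-based and end-exclusive, where `query`
    is the reverse complement of `ref` over the same alignment columns.

    Spans identical in the forward orientation (perfect palindromes) are
    skipped, because they would match without any inversion.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    n = len(ref)
    hits = []
    # Try the longest spans first so that shorter spans nested inside an
    # accepted hit are not reported again.
    for length in range(n, min_len - 1, -1):
        for start in range(0, n - length + 1):
            end = start + length
            if any(s <= start and end <= e for s, e in hits):
                continue
            window = query[start:end]
            if window != ref[start:end] and window == reverse_complement(ref[start:end]):
                hits.append((start, end))
    return sorted(hits)


# Toy example: columns 10-17 of `query` are the inverted form of `ref`.
ref = "TTTCTGTCTAGACCTTACATTTC"
query = "TTTCTGTCTAGTAAGGTCATTTC"
candidates = find_inversion_candidates(ref, query)
print(candidates)
```

As in the study, any candidate reported by a scan of this kind would still need visual validation, since flanking bases that happen to be complementary can extend an apparent inversion beyond its true endpoints.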



The DNA mfold server (http://mfold.bioinfo.rpi.edu/cgi-bin/dna-forml.cgi; [34])
was used to search for stem-loop structures, and the MEME server
(http://meme.sdsc.edu/meme4_4_0/intro.html) was used to search for
sequence motifs that might be associated with inversions.

Patterns and Rates of Microinversion Evolution 

Microinversions were coded as binary characters, and
PAUP* 4.0b10 [35] was used to calculate numbers of
inversion events using maximum-parsimony (MP) and
the Hackett et al. [27] topology. λ_MI was expressed as
microinversions Mb⁻¹ MY⁻¹ to facilitate comparison to
other studies [6]. The null hypothesis of equal genome-wide
microinversion rates was tested as described by
Han et al. [36]. Briefly, a global Poisson model (which
assumes equal genome-wide rates) was used as the null
hypothesis, and the fit of that null model was compared
to that of the more general negative binomial (NB)
model (which permits variation in λ_MI) using a likelihood
ratio test (LRT). See Additional file 2 for details.
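
As a worked illustration of the rate units, the following Python sketch computes the microinversion rate from an inversion count, a non-coding length in bp, and a treelength in MY. The function name is hypothetical; the three input rows are taken from Table 1.

```python
def microinversion_rate(n_inversions, noncoding_bp, treelength_my):
    """Rate in inversions per Mb per MY."""
    megabases = noncoding_bp / 1_000_000
    return n_inversions / (megabases * treelength_my)


# (number of inversions, mean non-coding length in bp, treelength in MY)
table1_rows = {
    "CLTC": (19, 1310, 9280),
    "IRF2": (2, 600, 9090),
    "FGB": (4, 2070, 9360),
}

rates = {locus: microinversion_rate(*row) for locus, row in table1_rows.items()}
for locus, rate in rates.items():
    print(f"{locus}: {rate:.2f} inversions per Mb per MY")
```

Rounded to two decimals, these reproduce the tabulated values for the three loci (1.56, 0.37, and 0.21).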

Phylogenetic Analyses 

Phylogenetic analyses of the CLTC alignment, con- 
ducted to provide an estimate of the CLTC gene tree, 
used RAxML 7.0.4 [37]. Microinversions and sites with 
gaps and/or missing data in more than 50% of taxa were 
excluded before conducting the RAxML search. See 
Additional file 2 for details. 

Results and Discussion 

Many Avian Microinversions were Identified

Manual and automated searches revealed that non-cod- 
ing regions associated with 11 of the 17 loci we exam- 
ined contained microinversions (e.g., Figure 2) ranging 
from 5 bp to 38 bp (Additional file 2, Table S2). Their 
median length was 22 bp. A number of the microinver- 
sions identified here were much shorter than those 
reported in genome-scale comparisons of mammals 



[Figure 2 alignment panels: (a) Phaethon, Caprimulgus, Ciconia, Passer, Falco, Leptosomus, Trogon, Pharomachrus, Brachypteracias; (b) Inverted Sequence: Trogon, Pharomachrus]

Figure 2 Example of a microinversion. (a) A conserved region in TPM1 intron 6 with a 24 bp microinversion (outlined in white) in Trogon
personatus. (b) Inverting the Trogon sequence (indicated in lower-case) results in a sequence identical to Pharomachrus auriceps, its sister taxon
in the tree.



Braun et al. BMC Evolutionary Biology 2011, 11:141
http://www.biomedcentral.com/1471-2148/11/141






[1,16], where the smallest microinversions were 23 bp 
and 31 bp, respectively. Although it is possible that 
birds and mammals have distinct microinversion size 
spectra, it seems more likely that the large-scale surveys 
of mammalian data failed to identify the shortest 
microinversions. 

If λ_MI were similar in birds and mammals, fewer than
four microinversions would be expected given the 
amount of sequence data examined; instead, microinver- 
sions were identified at 49 positions (Table 1). Ma et al. 
[1] reported that short inversions are more common 
than long inversions. If this pattern continues as micro- 
inversions become even shorter than those they identi- 
fied, the larger number of microinversions that we 
observed could reflect our identification of smaller 
inversions rather than any inherent difference between 
mammalian and avian genomes. The denser taxon sam- 
pling in our study, relative to whole genome studies in 
mammals, is also likely to have improved microinversion 
identification. Taken as a whole, our results suggest that 
previous studies that used mammalian data [1,6]
underestimated λ_MI.

The identification of microinversions can be difficult 
because point mutations and insertion-deletion events 
(indels) continue to accumulate after inversions. This 
has the potential to make ancient microinversions parti- 
cularly difficult, or impossible, to identify. Denser taxon 
sampling can help by increasing the number of 
sequences closely related to those with the microinver- 
sion and by providing multiple versions of the inverted 
sequence (Additional file 2, Figure S1). Although the
taxon sampling for this study was denser than previous 
surveys that used mammalian data, computational 
searches for microinversions were difficult. Many com- 
plementary strand alignments were not validated as 
actual inversions; the false positives reflected palin- 
dromes and other phenomena. bl2seq performed better 
than YASS, producing fewer false positives while still 
identifying all of the microinversions also found by 
YASS. However, even after employing two computa- 
tional approaches, some microinversions were only iden- 
tified "by eye" (Additional file 2, Table S2), suggesting 
that further improvements to the methods used to iden- 
tify microinversions are required. 

Most microinversions were assigned to terminal 
branches in the Hackett et al. [27] phylogeny (Figure 3) 
when the MP criterion was used. This raises the ques- 
tion of whether an acquisition bias caused us to miss a 
number of ancient microinversions that occurred closer 
to the base of the tree. However, the structure of the 
avian tree of life is dominated by a rapid radiation at 
the base of Neoaves, the most speciose avian supergroup 
(identified in Figure 3), leading to a tree dominated by 
terminal branches. Indeed, 70.8% of the overall 



treelength in the Hackett et al. ML tree [27] comprises 
terminal branches. The number of microinversions 
observed on terminal branches was not significantly dif- 
ferent from expectation given the proportion of the tree 
that reflected internal and terminal branches (χ² = 3.0;
P = 0.08). Thus, acquisition bias did not have a major 
impact upon our ability to identify ancient inversions. 
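
A goodness-of-fit comparison of this kind can be sketched as follows. The terminal-branch fraction (70.8%) is taken from the text, but the split of 44 terminal and 10 internal events is a hypothetical example chosen for illustration, not a count reported here. The P-value uses the closed form for a chi-square distribution with one degree of freedom.

```python
import math


def chi_square_two_classes(observed_terminal, observed_internal, terminal_fraction):
    """Chi-square goodness-of-fit statistic and P-value (1 degree of freedom)
    for events split between terminal and internal branches."""
    total = observed_terminal + observed_internal
    expected_terminal = total * terminal_fraction
    expected_internal = total - expected_terminal
    stat = (
        (observed_terminal - expected_terminal) ** 2 / expected_terminal
        + (observed_internal - expected_internal) ** 2 / expected_internal
    )
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts (assumption): 44 terminal and 10 internal events.
stat, p_value = chi_square_two_classes(44, 10, 0.708)
print(f"chi-square = {stat:.2f}, P = {p_value:.3f}")
```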

Avian Microinversion Rates Vary Among Loci 

Estimates of λ_MI differ among loci (Table 1). The Poisson
model of microinversion accumulation (the null
hypothesis) was rejected in favour of the NB model 
(which includes rate variation) using the LRT (2δlnL =
27.55; P < 10⁻⁶). Excluding the highest-rate loci (CLTC
and CLTCL1) eliminated our ability to reject the Poisson
model (2δlnL = 2.29; P = 0.13) and reduced the λ_MI
estimate to 0.25 microinversions Mb⁻¹ MY⁻¹ (the value
presented in Figure 1; 95% confidence interval of 0.17 - 
0.36). This suggests a "hotspot" model in which CLTC 
and CLTCL1 are inversion-prone. However, even the
lower estimate of λ_MI for "non-hotspot" loci greatly
exceeded previous estimates of λ_MI, consistent with our
hypothesis that the identification of microinversions, 
especially the shortest inversions, has been improved 
relative to prior studies. 

Surprisingly, both hotspot loci encode clathrin heavy 
chains, which are proteins critical for endocytosis [38], 
suggesting that the high microinversion rates could 
reflect their functional similarities. However, these cla- 
thrin heavy chain paralogs arose by duplication early in 
vertebrate evolution [39], and the homologous introns 
in CLTC and CLTCL1 do not exhibit detectable
sequence similarity. Although specific intronic motifs
can be overrepresented in functionally related genes
[40], motifs common to the CLTC and CLTCL1 introns
were not identified (data not shown). This suggests that 
it will be necessary to identify additional hotspot loci to 
understand the basis for inversion hotspots. 

Microinversions were absent in some loci (Table 1), 
but it is unclear whether this reflects stochastic variation 
or the existence of "coldspots". 3' UTRs are coldspot 
candidates because they exhibit a lower rate of sequence 
evolution than introns [29,41] and they are known to 
include regulatory elements [42]. Many of these regula- 
tory sequences are non-palindromic [43,44] and are 
unlikely to remain functional after inversion. Two to 
three microinversions were expected in our 3' UTR data 
(assuming equal rates for non-hotspot loci), but none 
were identified. We examined 3' UTRs from five additional
loci (ALDOB, CRYAA, EEF2, HMGN2, and
PCBD1), four of which have intronic microinversions
(Table 1), by examining 23 members of the avian order
Galliformes [41]. A 36 bp microinversion is present in
the Rollulus roulroul PCBD1 3' UTR, indicating that

[Figure 3: cladogram of the sampled avian taxa with microinversions marked on branches]

Figure 3 Microinversions indicated on the Hackett et al. [27] phylogeny. Inversions in introns are indicated with tick marks (blue for no
homoplasy, green for the homoplastic inversions in CLTC intron 6, and red for the homoplastic inversions in CLTC intron 7). The 3' UTR inversion
in PCBD1, which was obtained from selected galliforms (see Results and Discussion), is indicated with a blue diamond. This mapping of
character state changes assumes a reversal to the ancestral state in Psittaciformes for the CLTC intron 7 microinversion (indicated by an X over
the red tick mark). An inversion in CLTCL1 where Palaeognathae and Neognathae differ is shown along the root branch. Orders united by
microinversions are indicated using names above the branch uniting them and brackets to the right. The order Galliformes is emphasized
because 3' UTRs were sequenced from additional taxa in that order (see text). This phylogeny is presented as a cladogram because many
internal branches are very short and this presentation makes it easier to locate the inversion events. For branch length information refer to
Figure 3 in Hackett et al. [27] and the chronogram presented for this publication (Additional file 2, Figure S3).

these regions are not absolutely refractory to microinversions.
Thus, future surveys should include 3' UTRs to
improve λ_MI estimates for those regions and establish
whether they exhibit among-locus rate variation similar 
to introns. 

Homoplastic and Overlapping Microinversions Exist 

Two microinversions in CLTC appeared homoplastic 
because the inverted forms were present in divergent 
lineages (e.g., Additional File 2, Figure S2). These homo- 
plastic microinversions required at least three (CLTC 
intron 6) or four (CLTC intron 7) changes on the Hack- 
ett et al. [27] phylogeny using the MP criterion to 
explain the observed distribution of character states 
(Figure 3). Errors in the phylogeny are unlikely to 
explain this observation, since the relevant branches are 
well supported (compare Figure 3 to Figure 2 of Hackett 
et al. [27]; also see Additional File 2, Figure S2). More- 
over, when these microinversions were mapped on other 
recent estimates of avian phylogeny using the MP criter- 
ion they require similar levels of homoplasy. These 
other estimates of phylogeny are based upon nuclear 
[26,45], mitochondrial [46-48], and morphological data 
[49,50], as well as expert opinion (e.g., Figure 27.10 in
Cracraft et al. [51] and Figure 5 in Mayr [52]). 

Hemiplasy is unlikely to explain the observed homo- 
plastic microinversions for two reasons. First, hemiplasy 
would require maintenance of polymorphic inversions 
over multiple, long internal branches (estimates of 
branch lengths are presented as a chronogram in Addi- 
tional File 2, Figure S3). Second, the estimate of the 
CLTC gene tree was not consistent with the microinver- 
sion distribution (Additional file 2, Figure S4), even in 
the single case in which branch lengths are short 
enough that hemiplasy is plausible. Thus, the CLTC 
inversions reflect genuine homoplasy, not hemiplasy, a 
novel finding for microinversions in animal nuclear 
genomes. 

In addition to the homoplastic microinversions in 
CLTC, we also found several overlapping microinver- 
sions (Additional file 2, Table S2). All of these overlap- 
ping microinversions reflected independent inversions in 
distinct lineages. We identified two overlapping microinversions
in CLTC and one in CLTCL1; the two overlapping
microinversions in CLTC (INV-14 and INV-15; see
Additional file 2, Table S2) also overlapped with one of 
the homoplastic microinversions in CLTC (INV-13). 
Thus, there were at least 12 inversion events in four 
specific regions of the two hotspot loci. There were also 
two additional overlapping inversions in low-rate loci 
(EEF2 and IRF2). Neither the homoplastic nor the overlapping
microinversions were associated with stem-loop
motifs (e.g., Additional file 2, Figure S4) or any other
motifs that could be identified using MEME. These 



homoplastic and overlapping microinversions indicate 
that the actual state space for microinversions is likely 
to be smaller than their potential state space. 

Are Microinversions useful for Phylogenetics? 

Although the existence of homoplastic microinversions 
demonstrates that they are not perfect characters, they 
still have the potential to be useful phylogenetic markers.
The retention index of microinversions (RI_MI =
0.949) given the Hackett et al. [27] tree is substantially
higher than the retention index for nucleotide changes
(RI_intron = 0.52, RI_coding exon = 0.54, RI_UTR = 0.58). Such a
low amount of homoplasy suggests that an appropriate
analytical approach (that accommodates homoplasy and 
hemiplasy) should yield an accurate species tree given a 
sufficient number of inversions. 
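
For readers unfamiliar with the retention index, the standard ensemble definition is RI = (G - S) / (G - M), where M is the minimum possible number of steps, S the number of steps observed on the tree, and G the maximum possible number of steps, each summed over characters. The Python sketch below applies that definition to three hypothetical binary characters; the values are invented for illustration and are not the microinversion data analysed here.

```python
def retention_index(characters):
    """Ensemble retention index from (min_steps, observed_steps, max_steps)
    tuples, one tuple per character."""
    characters = list(characters)
    total_min = sum(m for m, _, _ in characters)
    total_obs = sum(s for _, s, _ in characters)
    total_max = sum(g for _, _, g in characters)
    return (total_max - total_obs) / (total_max - total_min)


# Hypothetical characters (assumption): two fit the tree perfectly,
# one requires three steps where one would suffice.
toy_characters = [(1, 1, 2), (1, 1, 5), (1, 3, 6)]
ri = retention_index(toy_characters)
print(f"RI = {ri:.2f}")
```

An index of 1 indicates no homoplasy, so values near 0.95 correspond to characters that fit the tree far better than values near 0.5.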

Branches at the base of Neoaves are very short and 
this radiation is a classic example of a "bush" phylo- 
geny [27]. In fact, the base of Neoaves has even been 
suggested to be a "hard" polytomy [53]. Hard poly- 
tomies reflect genuine multiple speciation events, so 
they cannot be represented as bifurcating trees. Even if 
Neoaves is a "soft" polytomy, many branches are likely 
to be <1 MY in length (Additional File 2, Figure S3; 
also see [26,45]). The low estimates of λ_MI imply that
microinversions will seldom occur along these short 
branches. How much sequence data would be neces- 
sary to resolve internodes of this length using microin- 
versions? Power analysis assuming 1 MY branch 
lengths using the rate estimate that excludes the hotspot
loci [54] indicates ~1.2 Mbp of non-coding
sequence per taxon is needed to find at least one infor- 
mative inversion and ~12 Mbp per taxon to identify an 
inversion on a specific branch (Additional file 2, Table 
S3). This estimate is orders of magnitude larger than 
the amount needed for conventional analyses of
sequence data (cf. Chojnowski et al. [26]). Moreover, it 
is desirable to identify multiple informative inversions 
along internodes given the potential for hemiplasy and 
homoplasy, suggesting that the use of microinversions 
as the sole source of information to estimate a phylo- 
geny similar to the avian tree of life would require 
even more data (Additional file 2, Table S3). 
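
The logic of such a power analysis can be sketched as follows, assuming that inversions arise as a Poisson process along a branch. This is a simplifying assumption of the sketch and is not necessarily the procedure of [54]. The rate passed to the functions is a hypothetical placeholder, not the estimate from this study, and the function names are illustrative.

```python
import math


def prob_at_least(k, rate, seq_bp, branch_my):
    """P(at least k inversions on one branch) under a Poisson model.

    rate      -- inversions per bp per MY (hypothetical input)
    seq_bp    -- non-coding sequence sampled per taxon, in bp
    branch_my -- branch length in millions of years
    """
    mean = rate * seq_bp * branch_my
    p_fewer = sum(
        math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k)
    )
    return 1.0 - p_fewer


def required_bp(rate, branch_my, power=0.95):
    """Sequence length at which P(at least one inversion) equals `power`."""
    if not 0.0 < power < 1.0:
        raise ValueError("power must lie strictly between 0 and 1")
    return -math.log(1.0 - power) / (rate * branch_my)


# Placeholder rate for illustration only (NOT a value from this study).
PLACEHOLDER_RATE = 1e-6
needed = required_bp(PLACEHOLDER_RATE, branch_my=1.0)
print(f"bp needed for one inversion: {needed:.3e}")
print(f"P(>= 2 inversions) at that length: "
      f"{prob_at_least(2, PLACEHOLDER_RATE, needed, 1.0):.3f}")
```

Because the required length scales as 1/(rate x branch length), halving the branch length doubles the sequence needed, and demanding two or more inversions on the same internode raises the requirement further.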

Microinversions and Multiple Sequence Alignment 

The identification of microinversions is also important 
to ensure correct sequence alignment. Otherwise esti- 
mates of the amount of evolutionary change will be dis- 
torted, potentially resulting in incorrect phylogenetic 
estimation [19]. Algorithms for sequence alignment that 
include the possibility of inversions have been proposed 
[55-57], and they have the potential advantage of incor- 
porating explicit penalties for inversion events. However, 
the optimal inversion penalty to limit false positives may 
be difficult to determine and the available algorithms are 
limited to the identification of non-overlapping microin- 
versions. Overlapping microinversions were found at 
four loci that we examined, suggesting that the inability 
to identify overlapping inversions may represent a major 
limitation. Overlapping and homoplastic microinversions 
can be divided into three basic categories (Additional 
file 2, Figure S6), and the strategy we employed should 
be able to detect two of these categories efficiently. The 
third category (type III in Additional file 2, Figure S6, 
which corresponds to the case of multiple homoplastic 
or overlapping inversion events on a single branch) is 
expected to be rare. It may be possible to overcome this 
problem in a multiple sequence alignment framework 
using a divide-and-conquer approach by selecting sub- 
sets of taxa for which overlapping microinversions are 
less likely to be present. This would necessitate a subse- 
quent assembly of the alignments. Moreover, such an 
approach might eliminate the benefits of dense taxon 
sampling. Despite these limitations, fully automated 
approaches could be less labour intensive than our 
approach. However, it is unclear whether microinversion 
identification can be fully automated since our results 
suggest that short microinversions may always require 
manual validation. Taken as a whole, these issues 
further emphasize the need to continue to improve algo- 
rithms for the detection and alignment of these interest- 
ing genomic changes. 
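
To make the detection problem concrete, here is a deliberately naive screen for candidate microinversions between two ungapped sequences of equal length. It is a sketch only: it is not the strategy used in this study and it is not one of the published inversion-aware alignment algorithms [55-57]. The function names, window limits and identity threshold are all assumptions. A window is flagged when the reverse complement of the query matches the reference better than the forward strand does.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def candidate_inversions(ref, query, min_len=8, max_len=50, min_identity=0.9):
    """List (start, end, identity) windows that look inverted in `query`.

    Assumes `ref` and `query` are already aligned without gaps, so an
    in-place inversion occupies the same coordinates in both sequences.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must have equal length")
    hits = []
    n = len(ref)
    for start in range(n - min_len + 1):
        for end in range(start + min_len, min(n, start + max_len) + 1):
            ref_win = ref[start:end]
            qry_win = query[start:end]
            fwd = identity(qry_win, ref_win)
            rev = identity(reverse_complement(qry_win), ref_win)
            if rev >= min_identity and rev > fwd:
                hits.append((start, end, rev))
    return hits


# Toy example (hypothetical sequences): a 10 bp inversion at positions 10-20.
left, core, right = "GATTACAGGC", "AAACCGTGTC", "CTTGAGATCA"
ref = left + core + right
query = left + reverse_complement(core) + right
for start, end, score in candidate_inversions(ref, query):
    print(start, end, round(score, 2))
```

Nested windows centred on the same inversion are all reported, so a practical tool would need to merge them. The screen also ignores gaps and overlapping inversions, and short windows can match by chance, which is consistent with the need for manual validation of short microinversions noted above.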

Conclusions 

These analyses demonstrate that the identification of 
microinversions is important, despite the relatively low 
rate of accumulation of these genomic changes. This 
study revealed that microinversions accumulate more 
rapidly in avian genomes than expected based upon 
prior analyses of mammalian genomes, although this dif- 
ference is likely to reflect the failure to identify very 
short inversions in the large-scale comparisons of mammalian 
data. If this failure to identify short microinversions 
does explain the differences between this and 
previous studies, the estimates of λmi presented here, 
which are similar to the rate of accumulation of the 
most common type of avian TE insertion (Figure 1), 
may be more typical of vertebrate genomes. The possibility 
that typical vertebrate λmi values are higher 
than suggested by previous studies emphasizes the 
importance of understanding the impact of microinver- 
sions upon genome evolution. We also documented the 
existence of microinversion hotspots, suggesting that 
some regions of the genome are especially prone to 
these mutations. The identification of additional hot- 
spots may provide information about the mechanistic 
basis of these mutations. Indeed, we were able to 
exclude one proposed mechanism, the existence of 
conserved stem-loops, based upon an examination of 
the inversion hotspots identified here. Despite our 
observation that microinversions can exhibit homoplasy, 
they are still relatively reliable RGCs and as such may 
define gene tree bipartitions more accurately than con- 
ventional sequence data (see Nishihara et al. [58]). In 
the future, analytical methods that integrate microinver- 
sions with sequence data and information about other 
RGCs (and incorporate the potential for both hemiplasy 
and homoplasy) will facilitate robust resolution of diffi- 
cult nodes in the tree of life and provide additional 
insights into the mechanism(s) responsible for their 
accumulation over evolutionary time. 